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Abstract 

Complex biological processes are usually experimented along time among a 
collection of individuals. Longitudinal data are then available and the statisti¬ 
cal challenge is to better understand the underlying biological mechanisms. The 
standard statistical approach is mixed-effects model, with regression functions 
that are now highly-developed to describe precisely the biological processes (solu¬ 
tions of multi-dimensional ordinary differential equations or of partial differential 
equation). When there is no analytical solution, a classical estimation approach 
relies on the coupling of a stochastic version of the EM algorithm (SAEM) with 
a MCMC algorithm. This procedure needs many evaluations of the regression 
function which is clearly prohibitive when a time-consuming solver is used for 
computing it. In this work a meta-model relying on a Gaussian process emulator 
is proposed to replace this regression function. The new source of uncertainty 
due to this approximation can be incorporated in the model which leads to what 
is called a mixed meta-model. A control on the distance between the maximum 
likelihood estimates in this mixed meta-model and the maximum likelihood esti¬ 
mates obtained with the exact mixed model is guaranteed. Eventually, numerical 
simulations are performed to illustrate the efficiency of this approach. 

Keywords: Mixed models. Stochastic EM algorithm, MCMC methods, 

Gaussian Process emulator. 


1 Introduction 


Mixed-effects model methodology ( Pinheiro and Bated . 2000l) allows to discriminate be¬ 
tween inter- and intra-subjects variabilities which is essential when dealing with lon¬ 
gitudinal data. Statistical methods for mixed models are now well established (see 
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references below) but can be time consuming depending on the complexity of the re¬ 
gression functions. Indeed sophisticated mathematical models have been developed to 
describe pr ecisely biological processes: multi-dimensional ordinary d i fferential equations 


describe pr ecisely biological processes: multi-dimensionai ordinary d i tterential equations 
(ODE) Isee lWii et al. . 20051 : lOiiedi et al. . 2007 : Layielle et ah . 2011 : Ribba et al.l . 20121 
for modeling yiral load dec rease in HIV patien t s or tumor growth) or partial differ¬ 
ential equation (PDE) (see iGrenier et al. , 2014 ; Chatteriee et al. , 20121 for modeling 
tumor growth or HVC yiral kinetic). These mathematical models haye no analytical 
solution and only an approximate solution can be obtained with computationally in- 
tensiye numerical methods. This induces a huge in crease of the comput ation cost of 
the estimation method (up to 23 days according to Grenier et al.l . l2014h . Therefore, 
there is a crucial need to deyelop new statistical approaches to reduce the computation 
time. The significant computation time of mixed models estimation methods is due 
to their iteratiye settings, compulsory to sides tep the intractability of t he likelihood. 
This is true for methods ba sed on linearisation ( Phiheiroand^at3 2000|) or likelihood 
numerical approximation ( Dayidian and Giltinanl . Il995 : Wolfinger . Il993 ) and t his is 
crucial for EM-type meth ods such as stochastic EM algorithms ( Wei and Tanner , 1990l : 
Kuhn and Layielle , 2005 1. 


Our objective is t o propose a way of reduc ing the computation time of the SAEM- 
MCMC algorithm ( Kuhn and Layielle . l2005l l for complex mixed models, together with 
a theoretical study of the resulting estimator. When the regression function is not 
analytically ayailable (and we call it also computer model in the rest of the p aper), 
extensions of SAEM haye already been proposed. Donnet and Samson ( 2007ll deal 
with the case of an ODE mixed model, approximating the solution with a numerical 
scheme and studying the influence of this scheme to the properties of the approximate 
maximum likelihood estimator. But this approach rem ains time consuming when the 
ODE is multi-dimensional. For a PDE mixed model, iGrenier et al.l l|2014^ propose 
to approximate the PDE with a numerical scheme on a predefined grid, and then to 
interpolate the solution of the PDE linearly between two points of the grid. This linear 
approximation allows substantially reducing the computation time from 23 days to 
around 30 minutes, but may lead to biased estimates depending on the non linearity of 
the model. 

The keystone of proyiding an efficient estimation method with good statistical proper¬ 
ties is the choice of the procedure approximating the regression function. More accurate 
surrogates of computer model than linear approximation rely on Gaussian process em¬ 
ulation which consists of modeling the computer model as the reali zation of a Gaussian 
process ( Sacks et al.l . Il989t Santner et al. , 2003t Fang et 2005 ). This technique is 

also known as Kriging. A cheap approximation of the model, the emulator, is obtained 
by conditioning the Gaussian process on some eyaluations of the model corresponding to 
inputs of a well-chosen design of numerical experiments. This stochastic modeling of the 
computer model proyides also a measure of uncertainty on the precision of the approxi¬ 
mation as a supplementary yariance-coyariance function which can be integrated in the 
mixed model. This ap proach has been already coupl ed with a Stoch astic EM algorithm 
( Barbillon et al.l . |201lll or with a Bayesian procedure ( Fu et al. . 2014 1 in regression mod¬ 
els without random effects. In this paper, we propose to couple the SAEM algorithm 
with this Gaussian process emulator, incorporating this new source of uncertainty due 
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to the approximation. Thus, providing a conhdence interval of the unknown parame¬ 
ters takes into account the error induced by the approximation. We will refer to this 
approach as the complete mixed meta-model. However, the supplementary variance- 
covariance function in the model induces a loss of independence of the observations 
obtained from the different subjects which increases the computational burdensome of 
the MCMC scheme. That is why we also propose two simplified versions: the first 
one (called intermediate) by considering a diagonal variance-covariance function of the 
approximation error; the second one (called simple) by only using the approximation 
of the computer model and not incorporating the variance-covariance function. These 
two last versions allow to assume the independence between the subjects, and to reduce 
signihcantly the computational cost. 


The Gaussian process emulator can also be interpreted as an approxim ation of th e com¬ 
puter model by kernel interpolation with radial basis function as in ISchabac^ ( 1995 , 


2007|). In this framework, a point-wise control on the error of approximation is pro¬ 


vided. Hence, we are able to guarantee a control on the distance between the maximum 
likelihood estimates in the approximate mixed meta-models and the maximum likeli¬ 
hood estimates obtained with the exact computer model. This control is decreasing to 
zero as a function of the space-fillingness of the design of numerical experiments. 


The paper is organized as follows. Section [2] introduces the standard non-linear mixed 
model and Section |3] recalls the principles and the main results of the Gaussian process 
emulation. Section 2] introduces three mixed models approximated by Gaussian process 
emulator. In Section [5l three versions of the SAEM algorithm coupled to a Gaussian 
process emulator are proposed. Theoretical results are given in Section [6l A simulation 
study illustrates these results (Section [7]). Section [8] concludes the paper with some 
extensions. Proofs are gathered in Appendix. 


2 Mixed model and notations 

Let us define = *(j/ii,... ,yini) where yij € is the response for individual i at 
time tij, i = 1,..., N, j = and let y = (yi,...,yvr) be the vector of all 

observations, of size ntot = assume that the individual vectors y^ are 

described by a non-linear mixed model, defined as follows, for j = 1,..., 

yij — /(Lj 5 Ue £, Sij A/'(0, 1) , (1) 

'02 ^iid A/'(p, H), 

where /(•,•) : R x is the regression function, ipi is a d-vector of individual 

parameters. The £i = {sn,... jeimY represents the Gaussian centered residual error, 
independent of ipi. The individual parameter ipi are assumed to be random, Gaussian 
with expectation y and d x d covariance matrix H. Note that the individual vectors 
(yi)i are independent and identically distributed. 

The quantities we want to estimate from the observations y are the population pa¬ 
rameters 9 = (/i,H,cr^). In the following, we restrict to the case of scalar observations 
(p = 1) to ease the reading, but the extension to a multidimensional observation with 
p > 1 is straightforward. 
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We want to estimate 6 by maximum likelihood. The likelihood of model o is: 



where = {tn ,..., and 

f(ti,'0i) = When / is non linear with respect to ip, the 

maximum likelihood estimator has no closed form. Any estimation method adapted 
to non-linear mixed models would require a very large number of evaluations of /, 
which could be time consuming when the structural function / is a computer model. 
Therefore, there is a real need to consider approximations of the function / that are 
simple to evaluate at any point For that purpose, we introduce the framework 

of meta-model in the next section. 


3 Met a-mo del 


We start with the point of view of conditioned Gaussian process emulation which has 
the nice feature of incorporating as a variance-covariance function the additional source 
of uncertainty due to the approximation. This will naturally lead to a mixed meta¬ 
model on which the SAEM-MCMC can be performed. We also link this framework to 
the kernel interpolation framework since we need the deterministic point-wise control 
on the approximation to obtain the control between the maximum likelihood estimates 
corresponding to the exact model and to its approximation. 

3.1 Conditioned Gaussian process 

In this framework, the function / is interpreted as a realization of a Gaussian Process. 
Let us denote Fx a Gaussian process dehned, for any x = as 


L 



(3) 


where hi,... ,hL are regression functions, H = {hi,..., hj^), /3 = (/3i,..., /3i) is a vector 
of parameters, C is a centered Gaussian process with covariance function 


Gov(C(x), C{x')) = a'^K^ix, x'), 


where is a correlation function depending on some parameters cp and A = (/3, cr, (p) 
is the vector of all unknown parameters. For instance, the so-called Gaussian kernel 
is dehned by K^{x,x') = exp(—— x'\\^). The regression functions hi,... ,hL are 
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usually linear functions or low degree polynomials. The kernel has to be chosen with 
respect to the assumed regularity of the function /. Similarly, the regression functions 
H have to be chosen with respect to t he supposed trend in the f unction f i f some 
insights on the function are available. See Koehler and Ow^ (199^; Fang et al. ( 2005 1 
for detailed discussions on the choice of the regression functions and kernels. 

We assume that we are able, in a pre-computation step, to evaluate precisely the func¬ 
tion f nu times for a given design of numerical experiments, D = {xi,... Xno}- These 
"exact" evaluations are denoted zi = f{xi),...,Zn = /(xud)- The {zk) are different 
from the [ijij) considered in model ([T|) which are noisy observations of / in some un¬ 
known points x = The design of expe riments D is usually chosen with respect 

to a space-filling criterion ( Fang et al.l . l2005^ in a bounded domain where the points 
are assumed to be. For a given kernel K, and a given vector H of regression 
functions, the vector of parameters A has to be estimated. Usually, A is estimated by 
maximizing the log-likelihood ip oi the Gaussian process F (|3]): 


£_f(A;z£)) = 


-hog{{27:a^r-\EU\) 

- HDf3){a^EU)-\zD - Hd(5) , 


(4) 


where 

1 <k.j<n n = (K^(xk,X i)), {HD)i<k<nD,i<j<L = hj{xk). The Matlab tool¬ 
box DACE ( Lophaven et al.l . l2002h provides an optimization algorithm to estimate di¬ 
rectly A. We denote by A = a, (p) the estimates. The Gaussian process is chosen to 
he F = It corresponds to a plug-in approach since from now, these parameters are 
considered as known. 

Then / is not directly approximated by K, but rather by the conditional process denoted 
F^, defined as the process F conditionally to F{xi) = zi,..., F{xno) = in short 
Zd = zp). The process F^ is still a Gaussian process, defined by its mean and covariance 
functions, which can be exactly computed. Let us introduce the partial functions, for 
any x € Kf : —>■ M defined by K^{x') = K^{x,x') for any x' and the 


vector = 


{Ki{xu))^ 


l<fc<ni5 

the process F^ are defined, for all x^x' as 


Then, the mean mD{x) and covariance Cd{x,x') of 


moix) = H{xYf3 + - Hd/B) 

CDix,x') = a^Ktix') - 


(5) 

( 6 ) 


The mean function mp) provides an approximation of the function / for any x and 
the variance function x i--> Cd{x,x) measures the confidence in the accuracy of this 
approximation. The plug-in approach for A = may lead to underestimate 

the uncertainty on the quality of the ap proximation which is showed to be asymptoti- 


tne uncertainty on tne quality ot tne ap proximation wnicn is snowed to be asymptoti¬ 
cally negligible I Prasad and Baol . 199(t) . It can be used only for the parameter of the 
correlation kernel cp. In this case, when the process is not conditioned on the 

conditioned process is a Student T-process with still closed-form location and scale 
( Santner et ah . 200,‘lh . However, we prefer the complete plug-in approach to deal with 
a Gaussian distribution which is easier to incorporate in the mixed model. 
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3.2 Kernel interpolation 

We can interpret the previous meta-model approximation as a kernel interpolation. In¬ 
deed, a Reproducing Kernel Hilbert Space (RKHS) can be associated to K'^ as soon as 
the kernel is positive definite. Then the partial functions defined in Section [01 
span a pre-Hilbert space with inner product < K^, >= K^{x, x'). Aronszajn’s the¬ 

orem states that there exists a unique completion T-Lx of this space with the reproducing 
property: 

Vti e Hk, X e v{x) =< v,K^ > . 

The space Hk is the RKHS associated to kernel K. Then, we focus on the function 
g{x) = f{x) — ^H{x)f3, where f3 is estimated as before. Note that this function can be 
seen as the residuals of the linear model ([3]) of the random variables Zi,..., Zn^ on the 
design D = {a:i,. ..Xud}' 


Zi = f(xi)= *H{xi)^ + g{xi). 

We consider the following problem of seeking for the function in Hx which interpolates 
g on points of D with a minimal norm: 

f \Mhk 

\ g{xk)=v{xk), k=l,...nD. 

The solution to this problem is the orthogonal projection of g on the subspace spanned 
by denoted sx,D{g)- R we assume that the function g = f — *Hf3 

belongs to Hx, then sx,D{g) is defined as 

SKMg{^)) = (7) 

no 

= *u{x){zd - Hd^) = ^ Uk{x)g{xk), 

fc=i 


with *u(a:) = ^ (same notations as in Section 13.111 . This provides an 

approximation of / which is the same than the function mjj The approximation 
rao belongs to the RKHS assuming that the regression functions H belong to the RKHS, 
which is true for linear or low degree polynomial regressors H. 

The kernel interpolation framework yields to an upper bound on the point-wise error of 
this approximation using the reproducing property and a Cauchy-Schwarz inequality. 
For any x, we have 


\f{x)-mD{x)\ = 


< 


< 


\g{x) - sx,D{g{x))\ 
no 

\<g,Ki-Y,Mx)Ki>\ 


riD 

h\\n^-\\Kt-Y.^k{x)Kt^\\n^ 
WgWuKPoix). 


( 8 ) 
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The norm Ugll-HK is unknown and depends on /. The norm Pd(x) does not depend on 
/ (or g) but on the design of experiments D only. It holds that 

Pn{x) = [K^x^x) - = ^Cd[x,x) . 

Again, there exists a link with the Gaussian process framework: up to the parameter 
we obtain the variance function of the conditioned Gaussian process ([5]). 

For some usual kernels, a uniform upper-bound on Pd{x) is available as a function of 


ao = sup mm \\x — Xk\ 

l<fc<ni5 


where A is a bounded subspace of TZ. The value of od is related to the coverage of the 
space X by the design of experi ments. A design of e xperiments which minimizes this 
quantity is said to be minimax ( Johnson et ah . 1990ll . The point-wise upper bound is 
given in the following Proposition ( Schabac^ 1995 1: 


Proposition 1. Assume that the experimental design D is minimax in X. Let Pk 
denote the RKHS associated to the kernel K which i s assum ed to be derived from a 
radial basis function as proposed in Wu and Schaback 1(199^} . Assume that f lies in 
Pk ■ Let m d denote the kernel approximation of the function f obtained on the design 
D. Then the point-wise error \f{x) — m£){x)\ is uniformly upper-bounded in X by 


\f{x) - mD{x)\ < WgWnKPoix) < ||g||-H^Gic(aD). 
where the function Gk is defined on R’*' and is such that lim GkM = 0. 
Furthermore, if the regressors H G Pkj then there exists a constant G > 0 such that 


\f{x) - mD{x)\ < C\\f\\-HKGK{aD) ■ 

For instance, when using a Gaussian kernel K^{x, x') = H , the function Gk is 

Gk{o) = Ge~^l°' where G and <5 are constants depending on if. 


4 Three mixed meta-models 

The estimation of the population parameter 6 is performed on the meta-model approx¬ 
imation of the mixed model ([T]). For computational reasons, we introduce three mixed 
meta-models. 


4.1 Complete mixed meta-model 


Let us introduce the so-called complete mixed meta-model that integrates the meta¬ 
model approximation. The regression function / in ([I]) is approximated by F£i{t,'ip) = 
mD{t,ip) -\-r{t,ip): 


Vi] 

— ^ij 1 ^ij '^iid 


^iid A/"(/J-, ^2), 

Fait, Ip) 

= with, 

r{t,'4)) 



AA(0,1), 


(9) 
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Whereas model CD is homoscedastic (constant error variance), the mixed metamodel 
([HD is heteroscedastic. Let us emphasize that this is not a standard heteroscedastic error 
model. Indeed, we have: 


Vij I "02 ^ -A/” {rriD {tij 5 02 ) ; r^) {t^j , 00) 


with 


rD(t,0) = CTe + CD{t,'lp\t,'ljj) , 


but the {yij\'>pi)j are not independent^ as well as the individual vectors (yi)i. This is 
due to the fact that the (ritij,ipi))ij are realizations of the same Gaussian process. 
This is a major difference with approximations that have already been proposed in the 
literature ( Donnet and SamsonL 120071 : iGrenier et ah . 2014h . Especially, this complicates 
the implementation of the MCMC scheme. 

We propose to estimate 9 as the maximum of the likelihood of model ([9D . We denote 


; 00) l<2<Ar,l<j/<n2 

the vector of the approximate mean, evaluated on (t,0) = {Uj Simi¬ 
larly, we denote 


Di.l'ij : ^ li'<N p<jj'<r. 

The likelihood of model ([9D is then: 

1 


PD{y]0) = j |p(0;0) 


(27r)"'“>‘/2|cr| -f CD(t, 0)|i/2 


( 10 ) 


1 


exp ( - ^*(y-mD(t,0))(crf/„,„, -kC£,(t,0)) 0y - m£,(t, 0)) ) d0 [>. 


This likelihood is not explicit because function mD(tij,00 is not linear in 0^. As said 
previously, this likelihood cannot be simplified as a product of individual likelihoods 
because the y^ are not independent (the matrix C£)(t,0) is a full matrix). The corre¬ 
sponding estimation algorithm requires to invert this ntot x ntot-nratrix at each iteration 
(at least N x 2d per iteration), which is highly computationally intensive. Therefore, 
we introduce an intermediate mixed meta-model by considering only the diagonal of 
Cz5(t,0). 


4.2 Intermediate mixed meta-model 


In the intermediate mixed meta-model, the regression function / is approximated by 
mD(t, 0)-|-f(t, 0), where f(t,0) has a diagonal covariance matrix = diag(CD(ti, 00): 


Vij 

= mD{Uj,tpi) + r{tij,tl;i)+ae£ij, 

£ij 

^ud A/'(0,1), 

02 

^iid A/”(p, n) 


^ind I?T’(0,Ai,^, = diag(CD(0,00)), 








The likelihood of model m is then: 


PD{y,d) 


n/ 



1 

(27r)”-/2 

- TTl£} (t^, '0j))(fTg ) (y^ Tn£) (t^, 


( 12 ) 



This form of the likelihood is separable with respect to ^pi and can be written as a 
product over the individuals which are independent. The covariance matrix (T^Im + Ai_^^ 
is diagonal and can be easily inverted. This will substantially reduce the computation 
time of the estimation method. However the intermediate model is heteroscedastic, 
and (Te might be more difficult to estimate than in the exact model. This is why we 
introduce a simpler mixed meta-model. 


4.3 Simple mixed meta-model 

The simple mixed meta-model neglects the error of approximation of the computer 
model. The regression function is then 

yij — T ^ij '^iid A/'(0, 1), (t^) 

"02 ^iid A/'(/r, H). 

The simple mixed meta-model (1131) has similar properties than model (HD: it is ho- 
moscedastic (constant error variance), the vectors {yi)i are independent and identically 
distributed, and for each individual i, conditionally to ipi, the {yij)j are independent. 
The likelihood of model (fT51) is given by: 

pD{y;0) = n/ |p(^"^) (27rag)»^/2 

exp i‘(yi - mDiti,'ipi))ia‘^Ini)~^iyi - mD{ti,i>i))^d'ilJi^, 
which has the same form than likelihood of model ©• 


5 Population parameter estimation 

Likelihoods of the mixed meta-models being not explicit, we resort to the family of EM 
algorithm to estimate the parameters 0, which is a classical approach for models with 
non-observed or incomplete data. We start with the SAEM algorithm for the exact 
mixed model and then for the three mixed meta-models. 


5.1 Estimation for the exact mixed model 


The objective is to maximize the li kelihood p(y; 9) of the exact mixed model ([T]). Let 


us briefly cover the EM principle (iDempster et al.l . 1197711 . The complete data of the 
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mixed model is (y, xp). The EM algorithm maximizes the Q{0\0') = ]E(L(y, rp; 0)|y; 0') 
function in 2 steps, where L{y,tp;0) is the log-likelihood of the complete data for the 
mixed model m and E is the expectation under the conditional distribution p{ip\y', O'). 
At the fc-th iteration, the E step is the evaluation of QkiO) = whereas 

the M step updates by maximizing Qk{0). For cases with a non analytic E 

step, Delvon et al. (199^ introduce a stochastic version SAEM of the EM algorithm 
which evaluates the integral QkiO) by a stochastic approximation procedure. The E 
step is then divided into a simulation step (S step) of the missing data under the 
conditional distribution pixp\y,0^^~^'>) and a stochastic approximation step (SA step) 
of the conditional expectation, using {"fk)k>o a sequence of positive numbers decreasing 
to 0: 

QkiO) = Qk-iiO) +7kiLiy,xP^^'^;0) - Qk-iiO)). 

In cases where the s i mulat ion of the non-observed vector ip cannot be directly performed, 
Kuhn and Laviell^ ( 2005l l propose to combine the SAEM algorithm with a Markov 
Chain Monte Carlo (MCMC) procedure. The idea is to simulate a Markov chain 
by use of a Metropolis-Hastings (M-H) algorithm with piip\y,0^^~^'>) as the unique 
stationary distribution. 

The complete data likelihood T(y, ip; 0) of the exact mixed model belongs to the regular 
curved exponential family: 


Piy^ip'^O) = exp{-i/(6»)-K < Siip),Xi0) >} 

where < •,• > denotes the scalar product, the minimal sufficient statistic Siy,ip) 
takes its values in an open subset S of R™, vd and A are functions of 0. Then the 
SA step reduces to approximate E Siy, ip)\0^'^~^'> at each iteration by the value Sk- 
The sufficient statistics for the exact mixed model are classically S'i(y, ip) = V”*! 

5'2(y, Ip) = i’i V* and 


Saiy,Ip) = J2^=iJ2'iLiiyij - fiUj.ipi))'^ dSamson et al.|. |2007l) . Then the M step is 


explicit and easy to implement. The convergence of the SAEM-MCMC algorithm has 
been proved when the complete data likelihood belongs to the regular curved exponen¬ 
tial family and under additional assumptions (see Proposition [2|). Thus the exponential 
family plays a crucial role to obtain an efficient algorithm. 


5.2 Estimation for the simple mixed meta-model 

The objective is to maximize the likelihood PDiy;0) of the simple mixed meta-model 
(1131) . In the following, all the quantities referring to this approximate likelihoodp£)(y; 0) 
are indexed by D with a tilde symbol. The corresponding complete data likelihood 
LDiy,ip;d) belongs to the regular curved exponential family with minimal sufficient 
statistics Soiy^ip): which are the same as the exact mixed model. In that model, the 
MCMC algorithm is easy to implement because of the independence of the observations 
of the individuals. More precisely, the SAEM-MCMC is described as follows. 

Algorithm 1. (SAEM-MCMC algorithm for the simple mixed meta-model) 

At iteration k, given the current values of the estimators 
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(k) 

Simulation step: For each individual i separately and successively, update 
with m iterations of an MCMC procedure with PD{'4’i\yi',d^^~^'^) as stationary 
distribution: 

For I = 1... ,m, given a current value fof individual i: 

— Simulate a candidate ■i/'f with a proposal distribution 

— Meta-model step: Evaluate the meta-model 
mD{tij,'il)^) for all j = 1,... ,nt. 

— The candidate is accepted, %l}\ = iff, with probability otherwise 

the candidate is rejected, Tp\ = with probability 1 — where 

adwA .Wa ) = mm - , ^ ^^ ^ -r-j—, 1 • 

Set V'f ^ = V'z’"- 

Stochastic Approximation step: update the sufficient statistics: 


Sk,i = Sfe-i,i+7fc ~ gfc-i,!^, 

/ N \ 

Sk,2 = Sk-1,2 + 7fc ( X] “ Sk-1,2 


Sk,3 = Sfe-l,3+7fc EE(^. - mD{tij,ll)f^)f - Sfe_i,3 
\i=l j=l 

Maximisation step: update the population parameters 


T 


(fc) ^ fM ^(fc) ^ £M _ 


Sfc,2 Sfel Sfc 1 


. Sfe,3 


'e — 


N ’ - N m ■ ntot 

5.3 Estimation for the intermediate mixed meta-model 


In the following, all the quantities referring to the approximate likelihood PD{y\d) of 
the intermediate mixed meta-model are indexed by D with a bar symbol. 

This model belongs to the exponential family when the Gaussian process f is considered 
in the hidden states. Then the complete data of the intermediate mixed meta-model 
are {y,xf),f) where f = (f(ty,j=i,...,ni ■ The complete log-likelihood is thus: 


Lo(y,'0,r;6») 


logPD(y|r, Ip-, d) + logp£,(r|-0; 9) -f \ogp{ip-, 9) 


, '^tot / 2 

cst - — log(crg 


1 (yij -mD{tij,fi) -r(tij,tpi))'^ 

’ 0 2 ^ n-2 


i i 

E - p) , 

i 
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where cst denotes a constant term. The E-step is the computation of 
Q(0|0('=-i)) = E(L,3(y,t/,,f;0)|y; 

logPD(y, r, Ip; 0)pD{r\y, ip; 0^^~^'>)dv\ pD('0|y; 0‘^^~^'>)dxp . 


The conditional distribution PD{ji\yi,'>Pi;(^^^ is explicit, Gaussian, with mean and 
covariance defined by 

dfc-i) 


- (fc-i) 

p(/c—1) 


= r 


(fe-i) 


Vi (yi-mDi4’i))/<yj 


pk-1) 


- r,iPi — (l/o"! + ■ 


Integrated with respect to r inside Q{0\0^^ yields to 
Q(0|0('=-1)) = ‘ 


F - 2 E-- 




2^1 


1 t^(fc-l)A-l ^('=-1) 1 






-^iog{\n\)-^Y^\tPi-p)n 

i 

p{ip\y; 0^^~^^)d%p -\- cst. 


Then the sufficient statistic corresponding to cr^ is changed to (y, "0, r) = J2f=i l|yj~ 

”T-D('0i) ~ IP + The simulation step is a standard one, which can be 

applied to each individual separately. The MCMC algorithm targets _PD(t/’i|yi; 
as stationary distribution, where the process has been integrated out. The accep¬ 
tance probability only requires the knowledge of PD(yi|V'ij which is a Gaussian 

density with covariance matrix cr^Im + As this matrix is diagonal, its inversion 

at each iteration is fast. Finally, the SAEM-MGMG proceeds as follows: 


Algorithm 2. (SAEM-MCMC algorithm for the intermediate mixed meta¬ 
model) 

At iteration k, given the current values of the estimators 

ik^ 

S step: For each individual i separately and successively, update ipi with m 
iterations of an MCMC procedure with pD{'ipi\yi] as stationary distribution. 


SA step: update the sufficient statistics Skp and Sk ,2 os usual and update 
Sk,3 = Sk-i,3 + yk (^'^llyi-mDifJi)+Fr - Sk-1,3^ 
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M step: as usual. 


5.4 Estimation for the complete mixed meta-model 

In the following, all the quantities referring to the approximate likelihood PD{y\d) of 
the complete mixed meta-model (jO]) are indexed by D. 

The main difficulty comes from the fact that model dH]) is heteroscedastic but not in 
a standard way: the conditional distributions of yi\tjji are not independent and all the 
subjects have to be treated together. Similarly as the intermediate model, we consider 
the Gaussian process r in the complete data and we have to integrate out with respect to 
r to compute the function Q. The conditional distribution p£)(r|y, ip-, is explicit, 

Gaussian, with mean and covariance defined by: 

The matrix T^ ^ ' has dimension ritot x ntot and cannot be split as it was the case with 
the intermediate model. Thus the inversion of Ti- and Cd increases dramatically the 
computation time of the estimation algorithm. 

Moreover, the MCMC step is also more complex. Indeed, the conditional distributions 
Poitply) cannot be written as a product of individual conditional distributions. But the 
MGMG kernels are applied to each subject i successively. The corresponding target dis¬ 
tribution is the conditional distribution p£)('0i|y, ip_P) where ip_^ = {ipi, ■ ■ ■, ipi-i, ipi+i, ■ 
is the vector of individual parameters except individual i (with obvious notations when 
i = 1 OT i = N) and not the distribution poi'fpilyi) as in a standard heteroscedastic 
mixed model. This increases the difficulty of implementation of the MGMG: the whole 
covariance function C£)(t, ip), evaluated at each point {Uj, ipi), has to be evaluated and 
inverted at each iteration of the MCMC scheme. 


Algorithm 3. (SAEM-MCMC algorithm for the complete mixed meta-model) 

At iteration k, given the current values of the estimators 


S step: for each individual i successively, given the current values ip^^} = i • ■ • ) 
l/’i+i • I other individuals, update ip^^'^ with m iterations 

of an MCMC procedure with pD{ipi\y,ip^!pl-,0^^~^^) as stationary distribution: 

For I = 1... ,m, given a current value ip\~^ for individual i and a current vector 
= {ip ^^\..., 1 /o’’ individuals: 


Simulate a candidate ipf with a proposal distribution ^)- 

Set ip^ = {iP^^\ ..., ip^^\,ipt, ip^^+i ^'^,..., '>Pm~^^)- 

Meta-model step: For all j = evaluate the meta-model m niUj, ip ^). 

For all subjects i',i" = 1,...,7V (including subject i) and all observations 
j', j", evaluate the covariance functions CD{ti'j',ip’i'',tiiijii,ip’p,,) and invert 
the obtained matrix Cd- 


■ ) V'Af) 
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The candidate is aecepted, = '4’i> with probability ^); otherwise 

il)\ = with probability 1 — where 


= min 




Set 

SA step: update the sujficient statistics Sk,i and Sk ,2 os before and update: 
Sk,3 = Sk- 1,3 + 7fc (\\y - moiip) - f + Tr - Sk- 1 , 3 ^ ■ 


M step: as usual. 

Let us emphasize that the MCMC in the S step is difhcult to implement due to the 
heteroscedasticity of the complete mixed meta-model. This MCMC algorithm may have 
poor mixing properties because the vectors ipi are updated successively while they are 
highly correlated through this non-diagonal matrix Coit^xf). Another solution could 
be to design a proposal in the MCMC algorithm for the whole vector xjj. However, such 
a proposal is quite complicated to construct since the dimension of xj) is high: d x N. 


5.5 Fisher Information matrix estimates 

Using formula in Louis (1982) and estimation scheme proposed in lPelvon et al. ( 19991) . 
confidence intervals can be obtained on the parameters implementing a stochastic ap¬ 
proximation scheme of the Fisher Information matrix. It is only necessary to approxi¬ 
mate the gradient and the Hessian matrix of the log-likelihood of the complete data: 


\ogp{xf, y; e) = logp(y|rA; 6 ) -t- \ogp{xp-, 6 ). (15) 

Actually, \ogp{y\xp;9) does not depend on /i et H, hence the gradient and Hessian 
computations are only about logp{xp; 6 ) which is a multivariate normal M{p,, H). Thus, 
this implementation does not depend on the mixed model and remains the same for the 
standard mixed model and the three mixed meta-models. 


6 Convergence of the SAEM algorithm to the maxi¬ 
mum likelihood of the meta-model 

Since the SAEM-MCMC algorithm is not applied to model ([1]), but to an approximate 
mixed model, it is not possible to prove the convergence of the algorithm toward a local 
m aximum of the exac t likel ihood p{y, 6 ). However, it is possible to apply the results 
of Kuhn and LavielI3 ( 2005ll for the three mixed meta-models. Hence, the algorithms 
converge toward a local maximum of the likelihood pD{y;9), PD{y\&) and PD{y\&) 
when applied to the complete, interme diate or the simple mixed meta-models ([^ , dill) 
and (TT^ . respectively. This is given bv iKuhn and Laviellel (200^ that we briefly recall, 
without detailing their assumptions (M1)-(M5) and (SAEM1)-(SAEM4). 
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Proposition 2 (Kuhn and Lavielle). Under assumptions (M1)-(M5) and (SAEMl)- 
(SAEM4) for the complete, intermediate or simple mixed meta-model, if the sequence 
(sk) stays in a compact set, the SAEM algorithm produces a sequence which 

converges to the (local) maximum of the approximate likelihood piy(y;0), p_D(y;0) or 
pD{y',0), respectively. 

Now we study the impact of the meta-model approximations on the likelihoods. Our 
goal is to obtain a uniform control on the distance between the likelihood of the exact 
model p{y;6) and the likelihoods of the three mixed meta-models pu{y;6), PD(y;0) 
and pD{y,0) as a function of the quality of the meta-model. We start by the simple 
mixed meta-model. 


Proposition 3. Let us consider the likelihoods p(y; 0) ^ of the mixed model {Ip and 
PD{y; 0) 0 of the simple mixed meta-model hi A) associated to a minimax design D. 
Assume that the support of the distribution of tp is compact. Assume that the functions 
f and mo are uniformly bounded on the support of the distribution of ip. Assume that 
f lies in the RKHS associated with the kernel K satisfying to the same hypotheses as 
in Proposition QJ Then, there exists a constant Cy which depends only on y such that 

\piy,0) -PDiy;0)\ < Cy-^^^GKiao) 

O'e 

where the function Gk tends to 0 when a ^ 0 (defined in Proposition QP and the 
constant ao is the covering distance of the design of experiments D. 

Recall that, when using a Gaussian kernel K{x,x') for the meta-model approximation, 
the function Gk is defined by GK{a) = Ge~^l°^. Then, to ensure that this covering 
distance is small, we need a global upper-bound, uniformly in tp. This is true when 
the support of the distribution of ip is compact. Under this assumption, we obtain 
that the covering distance Gk{o,d) can be as small as required provided there is a 
sufficient number of points no in the design. Thus providing a rich design D during 
the pre-computation step allows to control as finely as we want the error induced on 
the likelihoods. 

Now, we can study the distance between the three mixed meta-models. 

Proposition 4. Let us consider the likelihoods PD{y,0) Udil of the complete mixed 
meta-model PD{y,0) HM) of the intermediate mixed meta-model il3\) and p]j{y,9) 
of the simple mixed meta-model GS\) associated to a minimax design D. 

Under the same hypotheses as Proposition\^ there exist two constants Gy and Gy which 
depend only on y such that 


\PD{y,d) -PD{y)\ 
\pD{y\0) -PD{y)\ 


< Cy^,GK{ao), 

< Cy^,GK{ao). 


Therefore, this guarantees a control between the likelihood of any of the mixed meta¬ 
model and the likelihood of the exact mixed model. 

With regularity hypotheses on the Hessian matrix of each likelihood, results similar to 
Donnet and SamsonI ( 2007 1 can be obtained: The distance between the maximum of 
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Parameter 


Intermediate 

meta-model 

Simplified 

meta-model 

Exact 

model 


no 

25 

50 

100 

25 

50 

100 



Bias 

-0.508 

-0.025 

0.121 

-0.483 

-0.015 

0.089 

-0.390 

Mlog V 

RMSE 

0.048 

0.043 

0.042 

0.048 

0.042 

0.041 

0.063 


Cov. 

92.8 

93.3 

93.3 

92.1 

93.6 

93.8 

86.3 


Bias 

-4.920 

-1.389 

-0.794 

-4.872 

-1.396 

-0.904 

-2.079 

/Clog ka 

RMSE 

0.860 

0.545 

0.476 

0.870 

0.541 

0.502 

0.797 


Cov. 

79.2 

86.9 

89.1 

77.8 

85.9 

87.4 

81.0 


Bias 

-2.067 

-0.599 

0.014 

-1.930 

-0.577 

-0.138 

-1.566 

/Clog Vm 

RMSE 

0.392 

0.333 

0.314 

0.401 

0.328 

0.327 

0.680 


Cov. 

87.3 

88.1 

88.9 

85.5 

88.9 

89.2 

82.5 


Bias 

4.569 

-2.408 

-2.185 

4.270 

-2.108 

-2.215 

-4.393 

C^log V 

RMSE 

6.487 

5.526 

5.276 

6.359 

5.445 

5.287 

9.461 


Cov. 

94.5 

92.2 

91.9 

94.8 

93.3 

93.5 

84.8 


Bias 

1.755 

-3.822 

-6.797 

1.935 

-5.022 

-6.799 

-1.705 

^logk. 

RMSE 

17.398 

16.382 

16.590 

17.465 

16.305 

16.870 

23.416 


Cov. 

84.4 

82.6 

81.4 

84.9 

81.6 

80.4 

80.0 


Bias 

-33.721 

-30.387 

-30.408 

-33.916 

-29.946 

-30.148 

1.867 

^log 

RMSE 

15.975 

13.236 

13.039 

15.981 

12.914 

13.236 

17.039 


Cov. 

62.6 

65.6 

65.8 

62.3 

69.6 

67.7 

83.8 

^2 

Biais 

2.449 

1.975 

2.337 

5.054 

2.648 

2.450 

0.308 


RMSE 

0.354 

0.302 

0.370 

0.650 

0.426 

0.397 

0.177 


Table 1: Michaelis-Menten pharmacokinetic simulations: relative bias (%), relative 
MSE (%) and coverage rate (%) computed over 1000 simulations, with the intermediate 
meta-, the simple meta- and the exact mixed models. Meta-models are built with either 
n_D = 25, n_D = 50 or no = 100 design points. Coverage rate (Cov.) is the coverage 
rate of the 95% confidence interval based on the stochastic approximation of the Fisher 
matrix. 


the exact likelihood p(y; 0) and the maximum of the approximate likelihoods pniy] 0), 
Paiy", 0) or pD{y\0) can be as small as we want, as soon as the design D is rich enough. 


7 Simulation study 

The objective of this study is to compare the main statistical properties of the estima¬ 
tion with the mixed meta-models and compare them to the exact mixed model. Two 
examples are illustrated below, using standard ODE pharmacokinetics (PK) models. 
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Parameter 


Intermediate 

meta-model 

Simple 

meta-model 

Exact 

model 


no 

50 

100 

50 

100 



Bias 

0.101 

0.007 

-0.320 

0.007 

0.003 

Ailog fee 

RMSE 

0.004 

0.005 

0.005 

0.005 

0.005 


Cov. 

94.2 

94.4 

90.6 

94.6 

93.9 


Bias 

-2.441 

0.001 

-8.380 

0.008 

-0.220 

A^log ka 

RMSE 

0.222 

0.162 

0.910 

0.160 

0.160 


Cov. 

90.9 

95.6 

59.6 

95.3 

95.6 


Bias 

0.388 

0.036 

0.160 

0.036 

-0.004 

Mlog Cl 

RMSE 

0.004 

0.003 

0.003 

0.003 

0.003 


Cov. 

87.6 

95.1 

93.4 

94.7 

94.9 


Bias 

-12.113 

-2.745 

-23.200 

-2.780 

-3.400 

'^log fee 

RMSE 

7.131 

6.404 

9.730 

6.530 

6.460 


Cov. 

83.2 

91.5 

65.7 

90.5 

90.3 


Bias 

-20.485 

-3.442 

20.900 

-3.320 

-2.440 

‘^logfee 

RMSE 

10.696 

5.911 

13.500 

5.930 

6.050 


Cov. 

72.3 

89.7 

96.9 

89.2 

90.2 


Bias 

0.375 

-1.145 

-8.100 

-1.100 

-2.660 

^logC, 

RMSE 

5.944 

5.726 

5.810 

5.690 

5.650 

Cov. 

92.6 

92.0 

87.5 

92.8 

91.1 

^2 

Biais 

-45.262 

-0.612 

16.000 

-0.009 

-0.023 


RMSE 

20.719 

0.232 

2.950 

0.220 

0.220 


Table 2: One compartment simulations: relative bias (%), relative MSE (%) and cover¬ 
age rate (%) computed over 1000 simulations, with the intermediate meta-, the simple 
meta- and the exact mixed models. Meta-models are built with either no = 50 or 
ud = 100 design points. Coverage rate (Cov.) is the coverage rate of the 95% confi¬ 
dence interval based on the stochastic approximation of the Fisher matrix. 
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7.1 Michaelis-Menten pharmacokinetic model 

7.1.1 Simulation settings 

Let us now consider a one-compartment pharmacokinetic model, first order absorption 
and Michaelis Menten elimination. A dose D of a drug is given to a patient by intra¬ 
venous bolus. The concentration of the drug in the body along time in then described 
by the following ordinary differential equation 

Vm- f , , D r 1 ^ r, 

- t : = -—F + • TT • exp(-fca • t , /(to) = 0 

at km + J V 


where V is the volume of distribution, Vm is the maximum elimination rate (in amount 
per time unit), km is the Michaelis-Menten constant (in concentration unit) and ka is the 
absorption constant. We consider log km as fixed to —2.5. The individual parameter 
Ip consists in logV^, log ka and logWi. We assume a Gaussian distribution on the 
logarithm of these parameters with mean (piogy, Piogfca,/^iogV 4 „) = (2.5,1, —0.994) and 
a diagonal covariance matrix with terms (wfogv>^logfca’^logv™) = (0.09,0.09,0.09). 
Then a homoscedastic additive error model is simulated with a standard error = 0.1. 
We implement the four algorithms: SAEM on the original mixed model, SAEM on 
the complete, intermediate and simple mixed meta-model. For the meta-model SAEM 
algorithms, we use successively riD = 25, ud = 50 and no = 100 number of points in the 
design of experiments for the Gaussian process emulator. The covariance is Gaussian, 
and the regression functions H are linear functions. More sophisticated choices in the 
regression functions and in the kernel can be made. However, our goal in this section 
is to illustrate in a quite simple case the efficiency of the combination of the Gaussian 
process emulation with the SAEM-MCMC algorithm. In the pre-computation step, for 
a given value ^p, the ODE solver provides f{t,ip) for each time of measurement. Thus, 
the design of numerical experiment is only built over the values of ip and not t. We 
have compared two approaches one where t is considered as an additional input and 
the other where a meta-model is built for each time t. Based on the comparison of the 
quality of the approximations, we have kept the second one which is quite simple to 
deal with. However, more sophisticated approaches can be tested fsee iRoueierl . 12008 . 
for a review). The approximation is built over the domain: [1.6; 3.3] for log V, [0; 2.1] for 


log ka and [—1.6; —0.3] for log Wi- The starting values for the parameters are 


( 0 ) _ 


5^(0) — 0 5 

t^logfca — f^logVrr, 


= -0.5, 


.2(0) 

dogv 


. 2 ( 0 ) 

^log ka 


. 2 ( 0 ) 

‘^log Vr, 


= 0.1 and = 0.3. 


ogV 


= 2 , 


7.1.2 Results 

The computation times for one run of SAEM (100 iterations of SAEM, with 15 iterations 
of MCMC at each SAEM iteration) were the following: around 15 min for the exact 
mixed model (requiring solving the ODE at each iteration of MGMG), around 30 min 
for the complete mixed meta-model with = 50 (requiring inverting the C{t,xp) at 
each iteration of MGMG), around 80 sec for the intermediate model and 30 sec for the 
simple one. Therefore, in the following, we only present the results for the exact mixed 
model (as a benchmark) and the intermediate and simple mixed meta-models. 

Relative bias and relative root mean square error (RMSE) are computed for each pop¬ 
ulation parameter from 1000 replications and presented in Table [T] The 95% coverage 
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rates correspond to the coverage rate of the confidence interval on parameters based 
on the stochastic approximation of the Fisher Information matrix. In this example, 
the two meta-models have good performances even with only 25 points in the design 
of experiments and increasing no decreases the bias. The parameter wiogv;„ is biased 
when using a meta-model (whatever njj) while it is not with the exact model. This 
may be due to the error of approximation that is not completely taken into account. 


7.2 First order pharmacokinetic model 

7.2.1 Simulation settings 

Let us consider a one-compartment PK model with first order absorption and elimina¬ 
tion, with a dose I? of a drug. The concentration of the drug in the body along time in 
then described by the following ordinary differential equation 

^ = D^^exp{-kat) - kef, /(to) = 0 


where ka and kg are the absorption and elimin ation constants, Ci is the clearance. 
We consider the PK parameters of theophylliii ( Pinheiro and Batesl 200011 : log fee = 
—2.52, log/ca = 0.4, logC; = —3.22. One dataset of 36 patients is simulated with a 
dose D=6 mmol and measurements at time t = 0.25,0.5,1, 2, 3.5, 5, 7,9,12 hours. The 
random effects were simulated assuming a Gaussian distribution for the logarithm of the 
parameters with a diagonal variance-covariance matrix O with the following diagonal 
elements: wfogfee ~ ^iogfc„ = ^logC; = 0.01. Then a homoscedastic additive error model 
is simulated with a standard error = 0.1. 

The same SAEM algorithms were run as in the first example. The Gaussian process 
emulators were built with nu = 50 and no = 100 points, linear regression functions and 
a Gaussian covariance kernel, in the same fashion as in subsection 17.1.11 The domain 
where the approximation is built is [—4; —1] for log/ce, [0; 2] for \og ka and [—4.5; 2] for 
logC/. The starting values for the parameters are = —3, = IjMiogCfc = 


w- 


2 ( 0 ) 
log fcj 


= W- 


2 ( 0 ) 
log kc 


= = 0-1 = 0.3. 


7.2.2 Results 

The computation times are the same as before. Relative bias, relative RMSE and cov¬ 
erage rate computed from 1000 replications are presented in Table [2j When the design 
of numerical experiments in the pre-computation step has 100 points, the estimates 
obtained with the mixed meta-models have similar performance to the ones obtained 
with the exact mixed model. With only 50 points in the design, the estimates with the 
mixed meta-models are less accurate, especially with the intermediate mixed model. 
There is a clear improvement of the quality of the estimates with the intermediate 
mixed meta-model for the parameters concerning the means. Recall that the simple 
mixed meta-model neglects the approximation of the function /. Therefore, taking into 
account the errors of the approximation of the Gaussian process emulator in the model 
prevents from a systematic bias in the estimates. However, since the correlation be¬ 
tween the Gaussian process emulator approximation errors are set to zero for the sake 
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of simplicity, the estimation of a^- may be less accurate but usually this parameter if of 
less interest. 


8 Concluding remarks 

In the case of a mixed model where the regression function is a non-analytical solution 
of an ODE or of a PDE, we proposed to build a so-called meta-model which is obtained 
thanks to a pre-computation step. It consists in running the ODE/PDE solver on a 
well chosen design of numerical experiments. Once this meta-model is obtained, we 
use it as a surrogate of the regression function in the estimation procedure which is 
based on a SAEM-MCMC algorithm. We derived three mixed meta-models depending 
on whether the additional source of uncertainty due to the approximation by the meta¬ 
model is taken into account totally, partially or not at all (complete, intermediate and 
simple mixed meta-model). In the complete mixed meta-model, there is a full covari¬ 
ance matrix accounting for dependencies induced by the meta-modeling errors which 
slows down the SAEM-MCMC algorithm. That is why we have renounced to test it in 
the simulation study. Further works are needed to design MCMC algorithm adapted to 
this case of non-independent individuals. In the intermediate and simple mixed meta¬ 
model, the individuals are still independent thus the SAEM-MCMC algorithm does not 
suffer from any computational burden. We showed examples where even with a very 
few design points for the meta-model approximation, the estimation results are very 
satisfactory. We also showed an example where the intermediate meta-model improves 
the quality of the estimates of the parameters especially those accounting for the mean 
of the population parameters. 

Since the quality of the approximation provided by the meta-model directly depends 
on the density of the numerical design of experiments in the neighborhood of the input 
where the approximation is made, a sequential strategy for building an adaptive design 
reinforcing the meta-model where the SAEM-MCMC algorithm identifies likely region 
for the parameters should improve the estimates. However, this strategy would make 
the Markov property in the SAEM-MCMC procedure no longer to be valid. Therefore, 
there are theoretical questions which will be interesting to solve in order to ensure 
guarantees in this case. 
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Appendix 

Proof of proposition [3] 

We have 
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Therefore, we start by studying |p(y|'0;0) — p£)(y|i/); 0)|: 


(27rcre)"*°*/^|p(y|'i/’; 0 ) - PD(y|i/’; 0 )\ 

exp ^ ~ - exp ^ - mniUj 

^ ^ ij ' ^ ^ ij '' 

exp 

1 - exp f - ^ (^{yij - mD{tij,'ipi)f - {yij - j 

^ ij ' 

1 - exp ^ (^fitij,ipi) - mD{tij,ilji))(2yij - 


Under the assumption that the functions / and ttid are uniformly bounded on the 
support of i/), there exists a constant Cy which is uniform according to if) such that 
\‘^yij — /(^j'0) — 'niD(t,'ip)\ < Cy. Proposition [T] implies that the approximation error 
due to the metamodel V'i) — ’m'DiUj,'4’i)\ is controlled by inequality ([5|l: 

\f{Uj,i>i) < WfWuKGKiao) ■ 


Then there exists a constant Cy depending only on y such that 

( 27 rcr 2 )"tot/ 2 |p(y|^. 0 ) _p^(y|^;6»)| < Cy^\\f\\Ht,CK{aD). 

Finally 

\piy;O)-PDiy;0)\ < Y^p^f^'^WfW'HKGKiaD). 

□ 


8.1 Proof of proposition |4] 

We study the distance between the two likelihoods pu and po- As in Proposition [3l we 
start by studying \pD{y\'4’',0) — PD{y\'^\0)\- We consider two Gaussian distributions 
with same expectations and different covariance matrix. Thus this distance is maximum 
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when - mDiUj= 0. This yields 
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(27r)”‘“‘/^|p£,(y|i/;; 0) - P£,(y|i/j; e)| 
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Q-^tOt 
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^tot 
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Cd{Uj , ty , 


where we use that the determinant, as a product of eigen values, is smaller than a 
function of the trace of the matrix. Thus, the sum is over the diagonal of the matrix 
Cjy i.e. the sum of the variances. Then, we obtain that there exists a constant C such 
that 


\pD{y\tp;0)-PDiy\-4’;0)\ < C 


cre*°* 2cr? 


n-2 E 


^3 


< C{2^r-^^^^GKian) 

O'e 


where the last inequality holds using Proposition [T] Finally, we obtain 


\pD{y,0) -PDiy;0)\ < C, 


'^tot 
y —"n-tot +2 


Gif(a£)). 


The proof is similar for the distance between the two likelihoods pd and po- D 
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